Feature Recognition with Minimal Assumptions by Neural Self-organisation

Author

CHRIS J S WEBBER

Second International Symposium on Soft Computing for Industry, Anchorage USA, May 1998
Abstract

A generic approach to perceptual pattern-recognition and data fusion is presented. Unsupervised neural self-organisation is used to discover and encode the component features of the training data, so that those components are subsequently detected with generalisation and discrimination. The approach is designed to be used in hierarchical networks to encode structures at many levels of abstraction. Components are extracted on the basis of the statistical dependences within the data rather than prior assumptions about the data: properties such as size-scales are deduced from the ranges of statistical dependences. The pattern-recognition functions of generalisation and discrimination emerge from the self-organisation process, without having been optimised explicitly. Neurons acquire unknown transformation invariances through a spontaneous symmetry-breaking mechanism, which allows weight-vector symmetries to emerge in a data-driven way. The principles are demonstrated using a handwriting-image database and a continuous-speech database. In the first case, neurons develop into detectors for localised line- and curve-segments. In the second case, the network develops into a detector for the phones and words encountered in the training data. The network is given no prior knowledge of these components.

1. SELF-ORGANISATION IS TO BE USED WHERE GUESSING THE INPUT-OUTPUT MAPPING IS INCONCEIVABLE

Perceptual pattern-recognition problems require interpretations to be made on the basis of mega-dimensional data having complicated underlying structure, with degrees of freedom of unknown topology. Transformations that could simplify this structure (making its invariances manifest) are unknown, and probably require a hierarchy of many levels of feature-extraction. Engineering hard-wired solutions to such pattern-recognition problems is inconceivable: hard-wired solutions would require the input-output mapping to be guessed at the design stage.

An alternative approach is to use unsupervised neural networks to discover as much structure as can be extracted, and to re-code the data in terms of that structure. Non-linear transformations of this kind can be repeated again and again in a hierarchical way, building up increasingly abstract interpretations as more and more subtle variability is removed from the data.

2. STRUCTURE-FINDING AND PATTERN-RECOGNITION

Although unimaginably complicated, perceptual data are far from random. With the benefit of our naturally-selected pattern-recognition machinery, we interpret the perceptual world as being made up of features and objects, which are typically composites with their own internal sub-structure and variability: the degrees of freedom in perceptual data are hierarchically structured. One of the most sought-after goals of unsupervised neural networks is componential coding. Various other (effectively) synonymous terms are used in the literature: factorial, sparse, or multiple-cause coding. The basic idea is to extract relatively immutable component sub-structures, which arise repeatedly in the different patterns of a data-set. It amounts to finding an optimal 'alphabet' of symbols with which to describe the data and its variability.

Extraction is done on the basis of statistical dependences ('higher-order correlations') between the pattern-vectors' coordinates (pixels): one should not impose prior assumptions about the nature of the components, or constrain the coordinate-subspaces inside which the components are to be found. For example, it should not be assumed that the components will have a characteristic size-scale, or even that they are localised at all. Componential coding is an inherently non-linear transformation, and extracts much more subtle structure than easily understood linear methods like principal component analysis (PCA). For example, PCA could never extract localised components from translationally invariant image-statistics (because eigenvectors of Toeplitz matrices are periodic, not localised).
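The Toeplitz argument is easy to verify numerically. The sketch below is an illustration added for this edition, not part of the paper: the exponential correlation decay, the periodic boundaries, and the participation-ratio measure of localisation are all assumptions chosen for simplicity.

```python
import numpy as np

# Translation-invariant statistics give a Toeplitz covariance (circulant here,
# because of the assumed periodic boundaries).  Its eigenvectors are global
# sinusoids, so PCA cannot deliver localised components from such statistics.
n = 64
lags = np.minimum(np.arange(n), n - np.arange(n))      # circular pixel distance
row = np.exp(-lags / 4.0)                              # assumed correlation decay
cov = np.array([[row[(j - i) % n] for j in range(n)] for i in range(n)])

eigvals, eigvecs = np.linalg.eigh(cov)
v = eigvecs[:, -2]                                     # leading non-uniform eigenvector

# Participation ratio: ~n for a vector spread over all coordinates,
# ~1 for a vector concentrated on a single coordinate.
p = v**2 / np.sum(v**2)
print("participation ratio: %.1f of %d coordinates" % (1.0 / np.sum(p**2), n))
```

For a sinusoidal eigenvector the ratio comes out near 2n/3: the principal components are spread across the whole input, never localised.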
Although componential coding sounds remarkably simple, only very simple components (wavelets) have been extracted from natural (non-synthetic) data, and only on a very small scale [1]. Componential coding has been conclusively demonstrated only for very simple, small-scale synthetic data, e.g. [2]. Here, we demonstrate some more convincing examples using natural data.

Deriving components from data without making data-specific assumptions allows us to make an objective attempt at addressing the question "What are features?". If they emerge from a self-organisation process without prior assumptions, then they are the basis-vectors for an optimal code, and they can be defined with reference to the objective function that has been optimised. Their properties (such as size-scales) can be derived from the data's statistics by the self-organisation process. The difficult part is choosing an objective function that can give rise to assumptionless component-discovery.

Self-organised componential coding results in the generation of a set of feature-detectors optimally matched to the data. An important motivation for encoding the data componentially is to recognise components with generalisation and discrimination: an adapted neuron will respond to any pattern containing its particular 'trigger' component, but will respond to no pattern from which its trigger component is absent. An example of emergent generalisation and discrimination in a self-organised componential network is demonstrated in this paper using continuous-speech data.

3. DOES THIS PHILOSOPHY WORK WITH REAL DATA?

Our first demonstration of componential coding uses as its training set the 9,000 512x256-pixel grey-level images of the CEDAR handwriting database. This training set is applied to a 'neuron gas' of 2,097,152 neurons (16x512x256), each of which receives input from the whole image via its 131,072-dimensional weight-vector. Each neuron computes as its output a (softened) semi-linear function $r(\mathbf{w} \cdot \mathbf{x})$ of its weight-input scalar product [Figure 1]. The form of the non-linearity $r$ can be related analytically to component-finding behaviour [3]. It is imagined that the outputs regenerate an estimate $\mathbf{E}$ of the input $\mathbf{x}$:

$$\mathbf{E} = \sum_{i=1}^{2{,}097{,}152} a_i \, \mathbf{w}_i \, r(\mathbf{w}_i \cdot \mathbf{x})$$

This reconstruction would be obtained by a Bayesian folded Markov chain on the basis of a temporally long sequence of individual neural firing-events of rates $r(\mathbf{w}_i \cdot \mathbf{x})$ [4]. The adaptive parameters are optimised in an unsupervised way ($\mathbf{w}$ by gradient descent and $a$ by matrix inversion [4]), to minimise the average reconstruction error $\langle \| \mathbf{E} - \mathbf{x} \|^2 \rangle$, so preserving as much as possible of the data's variability through the neural code.
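The following toy-scale sketch illustrates this optimisation scheme. It is a hand-written illustration, not the paper's code: the dimensions, the softplus form for $r$, the learning rate, and the small ridge term stabilising the matrix inversion are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (the paper uses 131,072-dimensional inputs and ~2.1 million
# neurons); X stands in for the training patterns.
d, m, n_pat = 64, 16, 500
X = rng.normal(size=(n_pat, d))
W = rng.normal(scale=0.1, size=(m, d))      # weight-vectors, random initial state
a = np.ones(m)                              # reconstruction coefficients

r  = lambda u: np.logaddexp(0.0, u)         # a softened semi-linear r(w.x)
dr = lambda u: 1.0 / (1.0 + np.exp(-u))     # its derivative

for step in range(200):
    U = X @ W.T                             # all scalar products w_i . x
    R = r(U)                                # neural output rates

    # a by linear least squares (the 'matrix inversion' step):
    A = (W @ W.T) * (R.T @ R)
    a = np.linalg.solve(A + 1e-6 * np.eye(m), np.sum(R * U, axis=0))

    # w by gradient descent on the average reconstruction error |E - x|^2,
    # where E = sum_i a_i w_i r(w_i . x):
    err = (a * R) @ W - X
    grad = a[:, None] * (R.T @ err) + (a * dr(U) * (err @ W.T)).T @ X
    W -= 1e-3 * grad / n_pat

err = (a * r(X @ W.T)) @ W - X
print("mean reconstruction error:", np.mean(np.sum(err**2, axis=1)))
```

The gradient has two terms because the reconstruction depends on each weight-vector twice: through the rate $r(\mathbf{w}_i \cdot \mathbf{x})$ and through the regenerating vector $\mathbf{w}_i$ itself.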
Figure 2 depicts the 16 distinct weight-vectors after self-organisation to encode the variability of the handwriting database. (The weight-vectors have adapted from initially random white-noise configurations.) Even though no prior constraint was imposed to select for localised features, the self-organisation has chosen to encode the variability of the data in terms of localised components: line- and curve-segments. The scale-lengths of the features found are determined purely by the ranges of statistical dependences between the image-pixels. A curve-segment code has been chosen as an optimal alphabet for handwriting: the self-organisation has discovered that line- and curve-segments are its basic components. The requirement that the different neurons cooperate to encode the data has encouraged them to develop complementary feature-detection functions: one detects horizontal segments, another vertical segments, and so on.

Figure 1. Single-layer network of semi-linear neurons (a 'neuron gas' without topographical ordering), whose inputs span an image.

A multi-level network would be necessary to combine the lines and curves into more complex structures: the components of a higher-level, more abstract representation.

4. MULTI-STAGE PATTERN-RECOGNITION NETWORKS

To illustrate the building of more abstract representations in multi-level networks, it is convenient to turn to speech data [Figure 3]. Unlike hidden Markov model speech-recognisers, the unsupervised componential network demonstrated here [Figure 4] treats spectrograms similarly to images, using the same semi-linear componential neurons as before. The higher layer not only optimises its weight-vector, but also optimises its input, by sending feedback signals to coordinate the adaptation of neurons in the lower layer.

Figure 2. The 16 distinct weight-vectors after self-organisation to encode the handwriting database.

Figure 3. Different spectrograms from a continuous-speech database, which happen to share a common component: the word "eight".

Figure 5 depicts the self-organisation of this simple two-level network into a detector for the word "six", a component encountered frequently in its continuous training speech [5]. Other paths to convergence generate detectors for other words, and for phones that arise in a variety of different word-contexts. The self-organisation has discovered that phones and words are the components of continuous speech, given no prior knowledge of the duration or even the existence of such structures.

Generalisation and discrimination can self-organise as emergent functions, without having been optimised explicitly. In these speech demonstrations, the self-organisation has indeed given rise to generalisation and discrimination, as can be seen graphically from Figure 6. Measures of generalisation and discrimination for word-detectors, obtained by comparison with an unseen labelled test-set, are respectively around 70% and 80% [5].
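As a concrete reading of these two measures, the sketch below scores a detector against a labelled test set. It is hypothetical: the exact scoring protocol of [5] is not reproduced here, and the thresholding rule, the function name, and the array layout are assumptions.

```python
import numpy as np

def score_detector(responses, contains_target, threshold):
    """Generalisation: the fraction of test patterns containing the trigger
    component on which the detector fires.  Discrimination: the fraction of
    patterns lacking the component on which it stays below threshold."""
    fired = np.asarray(responses) > threshold
    contains_target = np.asarray(contains_target)
    generalisation = np.mean(fired[contains_target])
    discrimination = np.mean(~fired[~contains_target])
    return generalisation, discrimination

# Hypothetical usage, with made-up responses for a "six"-detector:
resp = [0.91, 0.15, 0.78, 0.40, 0.88, 0.22]
has_six = [True, False, True, False, True, False]
g, d = score_detector(resp, has_six, threshold=0.5)
print("generalisation %.0f%%, discrimination %.0f%%" % (100 * g, 100 * d))
```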
These figures do not compete with well-tuned supervised speech-recognition methods, but the network demonstrated here is perhaps the simplest that could illustrate the principle, and is not tuned at all. Generalisation and discrimination are optimised explicitly in supervised methods; their emergence as by-products of self-organised componential coding has never before been demonstrated.

Figure 4. An unsupervised network that builds a two-level componential representation of continuous speech. The entire network self-organises to function as a phone- or word-detector.

Figure 5. The gradual self-organisation of a detector for the word "six" (bottom), from a random initial state (top). On the left, the higher-level weight-vector is seen to become localised, having started from uniform initial connectivity.

5. SELF-ORGANISED TRANSFORMATION-INVARIANCE

In the spontaneous symmetry-breaking mechanism mentioned in the abstract, the weight-vector of the neuron in its convergent state preserves some symmetries (transformation-invariances) of the dynamical equations for learning, and breaks other symmetries. Through self-organisation, the neuron's recognition-response will become invariant under any transformation that is a preserved symmetry. Since the symmetries of the learning equations are determined by the data's probability distribution (because the learning depends on the training data), this symmetry-preservation allows transformation-invariant recognition (generalisation) to self-organise to match symmetries in the data. Where symmetries are broken, discrimination self-organises instead. The key to getting transformation-invariant generalisation to self-organise is to design an objective function that doesn't break too many dynamical symmetries [6].

In hierarchical networks, the spontaneous symmetry-breaking mechanism could allow generalisation and discrimination, over transformations such as shift and deformation in composite objects, to self-organise. This behaviour can occur in very simple, biologically plausible neurons: in particular, it can be proven analytically that the particular form of threshold neuron used in these demonstrations preserves symmetries [6]. Spontaneous symmetry-breaking is a systematic approach to transformation-invariant recognition, which could be applied to transformations as general as shape deformation, local shift, or spoken-vowel variability. It can occur at any level of a hierarchical componential network, and is relevant for features at any level of abstraction.

Figure 6. The response of a self-organised "two"-detector, plotted as $m(\mathbf{x}(T), \mathbf{w}) \times 100$ against shift $T$, as an unseen test recording "two three two" (depicted underneath) passes across its receptive field. The detection response rises above threshold for both "two"s in the recording but not for the "three", illustrating generalisation and discrimination.

Suitable componential codes can re-cast all these different kinds of transformations as permutations among neurons representing context-related components [6]. For example, shifting a component from location A to location B will swap over the output-values of two component-detectors, one centred on location A and the other on B [lower layer of neurons in Figure 7]. If these component-detectors feed a neuron in a higher level through weights of equal values, the response of that higher-level neuron will remain invariant to the permutation-transformation, and hence invariant with respect to the location of the component. Symmetry-preserving neural learning dynamics generate convergent weight-vectors that exhibit this kind of self-organised symmetry between weight-values [6], and hence give rise to self-organised transformation-invariances.
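The permutation picture can be made concrete with a toy network. The sketch below is an added illustration under assumed details (one-dimensional 'images', periodic boundaries, correlation detectors, and all names therein): a bank of location-specific detectors for one component feeds a higher-level neuron through equal weight-values, and the higher-level response is unchanged when the component shifts.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 16                                    # 1-D 'image' size (periodic boundaries)
template = np.zeros(n)
template[:4] = rng.normal(size=4)         # a localised component

def detector_outputs(image):
    # One lower-level detector per location: the component template
    # correlated against the image at every shift.
    return np.array([np.roll(template, i) @ image for i in range(n)])

v = np.ones(n)                            # equal higher-level weight-values

image_A = np.roll(template, 2)            # the component placed at location A
image_B = np.roll(template, 9)            # the same component shifted to B

# Shifting the component permutes the lower-level outputs among themselves,
# so the equally-weighted higher-level response is exactly unchanged:
print(v @ detector_outputs(image_A), v @ detector_outputs(image_B))
```

Unequal weight-values would break the permutation symmetry, and the higher-level neuron would then discriminate between the two locations rather than generalise over them.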
6. WHERE ARE THE MINIMAL ASSUMPTIONS?

The philosophy of imposing as few assumptions as possible leaves the self-organisation sufficiently unconstrained to extract complicated and meaningful structure from the data. Some assumptions have to be made, however: you don't get something for nothing. All the assumptions are made in defining the right objective function, and all the rest is derived from the data: the components and their properties, the functions of generalisation and discrimination, and transformation-invariances.
